Goto

Collaborating Authors

 rashomon ratio




A Path to Simpler Models Starts With Noise

Neural Information Processing Systems

The Rashomon set is the set of models that perform approximately equally well on a given dataset, and the Rashomon ratio is the fraction of all models in a given hypothesis space that are in the Rashomon set. Rashomon ratios are often large for tabular datasets in criminal justice, healthcare, lending, education, and in other areas, which has practical implications about whether simpler models can attain the same level of accuracy as more complex models. An open question is why Rashomon ratios often tend to be large. In this work, we propose and study a mechanism of the data generation process, coupled with choices usually made by the analyst during the learning process, that determines the size of the Rashomon ratio. Specifically, we demonstrate that noisier datasets lead to larger Rashomon ratios through the way that practitioners train models. Additionally, we introduce a measure called pattern diversity, which captures the average difference in predictions between distinct classification patterns in the Rashomon set, and motivate why it tends to increase with label noise. Our results explain a key aspect of why simpler models often tend to perform as well as black box models on complex, noisier datasets.



Beyond the Single-Best Model: Rashomon Partial Dependence Profile for Trustworthy Explanations in AutoML

Cavus, Mustafa, van Rijn, Jan N., Biecek, Przemysław

arXiv.org Artificial Intelligence

Automated machine learning systems efficiently streamline model selection but often focus on a single best-performing model, overlooking explanation uncertainty--an essential concern in human-centered explainable AI. To address this, we propose a novel framework that incorporates model multiplicity into explanation generation by aggregating partial dependence profiles (PDP) from a set of near-optimal models, known as the Rashomon set. The resulting Rashomon PDP captures interpretive variability and highlights areas of disagreement, providing users with a richer, uncertainty-aware view of feature effects. To evaluate its usefulness, we introduce two quantitative metrics, the coverage rate and the mean width of confidence intervals, to evaluate the consistency between the standard PDP and the proposed Rashomon PDP. Experiments on 35 regression datasets from the OpenML-CTR23 benchmark suite show that in most of the cases, the Rashomon PDP covers less than 70% of the best model's PDP, underscoring the limitations of single-model explanations. Our findings suggest that Rashomon PDP improves the reliability and trustworthiness of model interpretations by adding additional information that would otherwise be neglected. This is particularly useful in high-stakes domains where transparency and confidence are critical.



A Path to Simpler Models Starts With Noise

Neural Information Processing Systems

The Rashomon set is the set of models that perform approximately equally well on a given dataset, and the Rashomon ratio is the fraction of all models in a given hypothesis space that are in the Rashomon set. Rashomon ratios are often large for tabular datasets in criminal justice, healthcare, lending, education, and in other areas, which has practical implications about whether simpler models can attain the same level of accuracy as more complex models. An open question is why Rashomon ratios often tend to be large. In this work, we propose and study a mechanism of the data generation process, coupled with choices usually made by the analyst during the learning process, that determines the size of the Rashomon ratio. Specifically, we demonstrate that noisier datasets lead to larger Rashomon ratios through the way that practitioners train models.


On the Rashomon ratio of infinite hypothesis sets

Coupkova, Evzenie, Boutin, Mireille

arXiv.org Machine Learning

Given a classification problem and a family of classifiers, the Rashomon ratio measures the proportion of classifiers that yield less than a given loss. Previous work has explored the advantage of a large Rashomon ratio in the case of a finite family of classifiers. Here we consider the more general case of an infinite family. We show that a large Rashomon ratio guarantees that choosing the classifier with the best empirical accuracy among a random subset of the family, which is likely to improve generalizability, will not increase the empirical loss too much. We quantify the Rashomon ratio in two examples involving infinite classifier families in order to illustrate situations in which it is large. In the first example, we estimate the Rashomon ratio of the classification of normally distributed classes using an affine classifier. In the second, we obtain a lower bound for the Rashomon ratio of a classification problem with a modified Gram matrix when the classifier family consists of two-layer ReLU neural networks. In general, we show that the Rashomon ratio can be estimated using a training dataset along with random samples from the classifier family and we provide guarantees that such an estimation is close to the true value of the Rashomon ratio.


A Path to Simpler Models Starts With Noise

Semenova, Lesia, Chen, Harry, Parr, Ronald, Rudin, Cynthia

arXiv.org Machine Learning

The Rashomon set is the set of models that perform approximately equally well on a given dataset, and the Rashomon ratio is the fraction of all models in a given hypothesis space that are in the Rashomon set. Rashomon ratios are often large for tabular datasets in criminal justice, healthcare, lending, education, and in other areas, which has practical implications about whether simpler models can attain the same level of accuracy as more complex models. An open question is why Rashomon ratios often tend to be large. In this work, we propose and study a mechanism of the data generation process, coupled with choices usually made by the analyst during the learning process, that determines the size of the Rashomon ratio. Specifically, we demonstrate that noisier datasets lead to larger Rashomon ratios through the way that practitioners train models. Additionally, we introduce a measure called pattern diversity, which captures the average difference in predictions between distinct classification patterns in the Rashomon set, and motivate why it tends to increase with label noise. Our results explain a key aspect of why simpler models often tend to perform as well as black box models on complex, noisier datasets.


Navigating the Sea of Explainability - WebSystemer.no

#artificialintelligence

This article is coauthored by Joy Rimchala and Shir Meir Lador. Rapid adoption of complex machine learning (ML) models in recent years has brought with it a new challenge for today's companies: how to interpret, understand, and explain the reasoning behind these complex models' predictions. Treating complex ML systems as trustworthy black boxes without sanity checking has led to some disastrous outcomes, as evidenced by recent disclosures of gender and racial biases in GenderShades¹. As ML-assisted predictions integrate more deeply into high-stakes decision-making, such as medical diagnoses, recidivism risk prediction, loan approval processes, etc., knowing the root causes of an ML prediction becomes crucial. If we know that certain model predictions reflect bias and are not aligned with our best knowledge and societal values (such as an equal opportunity policy or outcome equity), we can detect these undesirable ML defects, prevent the deployment of such ML systems, and correct model defects.